Information-theoretic Limits of Online Classification with Noisy Labels
We study online classification with general hypothesis classes, where the true labels are determined by some function within the class but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using the observed noisy labels and noiseless features, while performance is measured via the minimax risk against the true labels. The noise mechanism is modeled via a general noisy kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that the minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme between two hypotheses, together with a new conditional version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees that apply to general hypothesis classes while addressing noisy observations.
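The abstract does not define the Hellinger gap, so as orientation only: the standard squared Hellinger distance between two label distributions P and Q over a label set Y is

    H^2(P, Q) = \frac{1}{2} \sum_{y \in Y} \left( \sqrt{P(y)} - \sqrt{Q(y)} \right)^2,

and on our reading the paper's "gap" is a separation quantity built from this distance over the noisy label distributions that the kernel permits for disagreeing hypotheses; its exact form is not stated in the abstract.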
Learning from Noisy Label Distributions
In this paper, we consider a novel machine learning problem: learning a classifier from noisy label distributions. In this problem, each instance with a feature vector belongs to at least one group. Instead of the true label of each instance, we observe the label distribution of the instances associated with each group, where the label distribution is distorted by unknown noise. Our goals are to (1) estimate the true label of each instance, and (2) learn a classifier that predicts the true label of a new instance. We propose a probabilistic model that treats the true label distributions of the groups and the parameters representing the noise as hidden variables, and we learn the model with a variational Bayesian method. In numerical experiments, we show that the proposed model outperforms existing methods in estimating the true labels of instances. A minimal toy sketch of the setting appears below.
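The sketch below is a hypothetical simplification for illustration only, not the authors' method: binary labels, one group per instance, Gaussian distortion of the observed group label fractions, and a crude EM-style alternation standing in for the paper's variational Bayesian inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the setting (all choices here are hypothetical):
# n instances with 2-d features, G groups, binary labels from a logistic model.
n, G = 400, 8
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ w_true)))  # latent true labels

# Groups correlated with the features (quantile bins of the first feature),
# so that group-level label fractions carry information about the classifier.
edges = np.quantile(X[:, 0], np.linspace(0, 1, G + 1)[1:-1])
groups = np.digitize(X[:, 0], edges)

# Observed data: per-group label fractions distorted by unknown noise
# (here modeled, arbitrarily, as additive Gaussian noise).
obs = np.array([y[groups == g].mean() for g in range(G)])
obs = np.clip(obs + rng.normal(0.0, 0.05, size=G), 1e-3, 1 - 1e-3)

# Crude EM-style alternation (a stand-in for the paper's variational Bayes):
# E-step: shift each group's soft labels toward its observed fraction;
# M-step: one gradient step of the logistic log-likelihood on soft targets.
w = np.zeros(2)
for _ in range(200):
    p = 1 / (1 + np.exp(-X @ w))
    q = p.copy()
    for g in range(G):
        m = groups == g
        q[m] = np.clip(p[m] + (obs[g] - p[m].mean()), 0, 1)
    w += X.T @ (q - p) / n

est = (1 / (1 + np.exp(-X @ w)) > 0.5).astype(int)
print("recovered-label accuracy:", (est == y).mean())
```

The E-step here merely nudges each group's soft labels toward the observed fraction, whereas the paper places a full variational posterior over the groups' true label distributions and the noise parameters; the sketch only shows why noisy group-level fractions can still constrain an instance-level classifier.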